Activity 1 - CHICAGO CRIME DATA ANALYTICS


Analyst: Jessie O. Mompero Jr

In [1]:
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
import numpy as np
import warnings
warnings.filterwarnings('ignore')
import folium
from folium.plugins import HeatMap

DATABASE

In [2]:
# Forward slashes work on Windows, macOS, and Linux.
chicago_df = pd.read_csv('datasets/chicago_2001_present.csv')

FILLING UP NULL VALUES

In [3]:
# Text column: label missing values explicitly.
chicago_df['Location Description'] = chicago_df['Location Description'].fillna('unaccounted')
# District is filled with the column mean; a mean of district IDs is not a real district.
chicago_df['District'] = chicago_df['District'].fillna(chicago_df['District'].mean())
# Caution: filling numeric columns with a string converts them to object dtype.
chicago_df['Ward'] = chicago_df['Ward'].fillna('unaccounted')
chicago_df['Community Area'] = chicago_df['Community Area'].fillna('unaccounted')
chicago_df['X Coordinate'] = chicago_df['X Coordinate'].fillna('unaccounted')
chicago_df['Y Coordinate'] = chicago_df['Y Coordinate'].fillna('unaccounted')
chicago_df['Location'] = chicago_df['Location'].fillna('unaccounted')
# Drop rows that cannot be mapped (no latitude or longitude).
chicago_df = chicago_df.dropna(subset=['Latitude', 'Longitude'])
# Confirm that no nulls remain.
chicago_df.isnull().sum()
Out[3]:
ID                      0
Case Number             0
Date                    0
Block                   0
IUCR                    0
Primary Type            0
Description             0
Location Description    0
Arrest                  0
Domestic                0
Beat                    0
District                0
Ward                    0
Community Area          0
FBI Code                0
X Coordinate            0
Y Coordinate            0
Year                    0
Updated On              0
Latitude                0
Longitude               0
Location                0
dtype: int64
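
Filling a numeric column with the string 'unaccounted' converts the whole column to object dtype, which is why Ward, Community Area, and the coordinate columns appear as object under DATA TYPES. The following is a minimal sketch of one alternative, not the method used in this notebook: drop unmappable rows first, then keep ID columns numeric with pandas' nullable `Int64` type. The frame `demo` and its values are made up for illustration.

```python
import numpy as np
import pandas as pd

# Made-up rows standing in for the crime table (not real data).
demo = pd.DataFrame({
    "Ward": [1.0, np.nan, 3.0, 4.0],
    "Location Description": ["STREET", None, "RESIDENCE", "STREET"],
    "Latitude": [41.88, 41.90, np.nan, 41.75],
    "Longitude": [-87.63, -87.65, np.nan, -87.60],
})

# Filling a numeric column with a string turns it into object dtype.
as_text = demo["Ward"].fillna("unaccounted")
print(as_text.dtype)

# Alternative: drop rows without coordinates, then keep Ward numeric.
clean = demo.dropna(subset=["Latitude", "Longitude"]).copy()
clean["Ward"] = clean["Ward"].astype("Int64")  # missing values stay as <NA>
clean["Location Description"] = clean["Location Description"].fillna("unaccounted")

print(clean.dtypes)
print(clean.isnull().sum())
```

With this approach Ward still reports one missing value, so the trade-off is a numeric column with an honest gap instead of a text column with no nulls.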

DATA TYPES

In [4]:
chicago_df.dtypes
Out[4]:
ID                        int64
Case Number              object
Date                     object
Block                    object
IUCR                     object
Primary Type             object
Description              object
Location Description     object
Arrest                     bool
Domestic                   bool
Beat                      int64
District                float64
Ward                     object
Community Area           object
FBI Code                 object
X Coordinate             object
Y Coordinate             object
Year                      int64
Updated On               object
Latitude                float64
Longitude               float64
Location                 object
dtype: object

Q1 : YEAR 2001 ANALYSIS

In [5]:
chicago_2001 = chicago_df[chicago_df['Year'] == 2001]
loc_counts = chicago_2001['Primary Type'].value_counts().head(10)
plt.figure(figsize=(10,5))
sns.barplot(x=loc_counts.index, y=loc_counts.values, palette='magma')
plt.title('Top 10 Primary Crime in 2001')
plt.xticks(rotation=45, ha='right')
plt.xlabel('Primary Type')
plt.ylabel('Number of Incidents')
plt.show()

Insight No 1

This chart shows the top 10 primary crime types in 2001. Theft is clearly number one with nearly 100,000 records, which is surprisingly high. The data do not show causes, but I think widespread poverty and limited economic opportunities may have pushed many people toward quick, low‑risk ways to make money back then. Theft's sharp lead suggests that policy should have prioritized social support and targeted prevention to address root causes rather than only increasing enforcement.

YEAR 2001 HEATMAP

In [6]:
locations = list(zip(chicago_2001['Latitude'], chicago_2001['Longitude']))
m = folium.Map(location=[chicago_2001['Latitude'].mean(), chicago_2001['Longitude'].mean()], zoom_start=10)
HeatMap(locations).add_to(m)
m.save('insight_1.html')
m  
Out[6]:
[Interactive folium heat map of 2001 incidents; saved as insight_1.html]
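
The 2001 subset is large (theft alone has nearly 100,000 records), so the saved HTML map can become heavy. Below is a rough sketch of preparing the point list with an optional random sample. The helper name `heatmap_points` and the `max_points` cap are hypothetical choices for illustration, and the coordinates are randomly generated, not taken from the dataset.

```python
import numpy as np
import pandas as pd

def heatmap_points(df, max_points=50_000, seed=0):
    """Return [lat, lon] pairs, dropping missing coordinates and sampling if large."""
    coords = df[["Latitude", "Longitude"]].dropna()
    if len(coords) > max_points:
        # A fixed seed keeps the sampled map reproducible.
        coords = coords.sample(n=max_points, random_state=seed)
    return coords.values.tolist()

# Randomly generated coordinates for illustration only.
rng = np.random.default_rng(0)
demo = pd.DataFrame({
    "Latitude": 41.8 + rng.normal(0, 0.05, 1000),
    "Longitude": -87.6 + rng.normal(0, 0.05, 1000),
})

points = heatmap_points(demo, max_points=200)
print(len(points))  # 200
```

The resulting list can be passed to `HeatMap(points)` in the same way as `locations` in the cell above.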

Q2: YEAR 2024 ANALYSIS

In [7]:
chicago_2024 = chicago_df[chicago_df['Year'] == 2024]
loc_counts = chicago_2024['Primary Type'].value_counts().head(10)
plt.figure(figsize=(10,5))
sns.barplot(x=loc_counts.index, y=loc_counts.values, palette='magma')
plt.title('Top 10 Primary Crime in 2024')
plt.xticks(rotation=45, ha='right')
plt.xlabel('Primary Type')
plt.ylabel('Number of Incidents')
plt.show()

Insight No 2

This chart shows the top 10 primary crime types in 2024. Theft remains clearly number one with a large lead over other offenses, which is concerning because it shows persistent property‑crime pressure in the city. I think economic stress, opportunistic targets, and gaps in property security may have contributed to theft’s dominance, while the high counts for battery and criminal damage suggest that public‑space conflicts and vandalism were also major problems. The data imply that policy should balance enforcement with prevention by improving street and property security, expanding social supports, and targeting hotspots where theft and battery cluster.
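
Raw counts make 2001 and 2024 hard to compare because the yearly totals differ. A small sketch, assuming a made-up frame `demo` with the same `Year` and `Primary Type` columns, shows how `pd.crosstab` with `normalize="columns"` gives each crime type's share within its own year.

```python
import pandas as pd

# Made-up records for illustration only (not real counts).
demo = pd.DataFrame({
    "Year": [2001] * 5 + [2024] * 4,
    "Primary Type": [
        "THEFT", "THEFT", "BATTERY", "NARCOTICS", "THEFT",
        "THEFT", "BATTERY", "THEFT", "CRIMINAL DAMAGE",
    ],
})

two_years = demo[demo["Year"].isin([2001, 2024])]

# normalize="columns" makes every year column sum to 1.
shares = pd.crosstab(two_years["Primary Type"], two_years["Year"], normalize="columns")
print(shares.round(2))
```

On the real data, the same call on `chicago_df` would show whether theft's lead grew or shrank as a share of all reports.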

YEAR 2024 HEATMAP

In [8]:
locations = list(zip(chicago_2024['Latitude'], chicago_2024['Longitude']))
m = folium.Map(location=[chicago_2024['Latitude'].mean(), chicago_2024['Longitude'].mean()], zoom_start=10)
HeatMap(locations).add_to(m)
m.save('insight_2.html')
m  
Out[8]:
[Interactive folium heat map of 2024 incidents; saved as insight_2.html]

Q3: YEAR 2001 KIDNAPPING ANALYSIS

In [9]:
chicago_df_2001 = chicago_df[chicago_df['Year'] == 2001]
kidnapping_df = chicago_df_2001[chicago_df_2001['Primary Type'] == 'KIDNAPPING']

print("Total kidnapping rows in 2001:", len(kidnapping_df))

loc_counts = kidnapping_df['Location Description'].value_counts().head(10)

plt.figure(figsize=(10,5))
sns.barplot(x=loc_counts.index, y=loc_counts.values, palette='magma')
plt.title('Top Locations for KIDNAPPING in 2001')
plt.xticks(rotation=45, ha='right')
plt.xlabel('Location Description')
plt.ylabel('Number of Kidnapping Incidents')
plt.show()
Total kidnapping rows in 2001: 924

Insight No 3

In 2001, kidnapping incidents happened mostly at residences, which surprised me because I thought our houses were the most secure places. I think the era may play a role here, since criminals may have been bolder back then. A house usually holds only a few people, so a kidnapping at a residence may have been easier than one in a crowded space.

Insight No 4

Schools, public places, and buildings rank low. One factor could be that these places are crowded, and it is hard to kidnap someone with so many people around compared with a residence where only a few people are present. These locations may also be better guarded by officials than a residence, which mostly holds family members.

In [10]:
# This cell counts Arrest over all 2001 records, not only kidnapping rows.
# For kidnapping only, use kidnapping_df['Arrest'] instead.
chicago_df_2001 = chicago_df[chicago_df['Year'] == 2001]
arrest_counts = chicago_df_2001['Arrest'].value_counts()

plt.pie(arrest_counts,
        labels = arrest_counts.index,
        autopct = '%1.1f%%',
        startangle = 90)
plt.gcf().set_size_inches(10,7)
plt.title('Arrest Distribution, All Crimes, 2001')
plt.show()

Insight No 5

This pie chart shows the arrest distribution for 2001. The cell counts Arrest over chicago_df_2001, so the figure covers all reported crimes in 2001, not only kidnappings. Seeing 70.8% of reports with no arrest makes me uneasy because it suggests most victims don’t see immediate justice. That gap may reflect investigative limits, delayed reporting, or cases happening in ways that make identifying suspects hard. I’d want universities and local groups to push for faster reporting, better coordination with police, and more accessible support so victims aren’t left without follow-up.
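
Because the pie chart cell counts `Arrest` over every 2001 record, the kidnapping-only rate has to be computed on the kidnapping subset. The sketch below uses a hypothetical helper `arrest_rate` on a made-up frame `demo` to show how the two figures can differ; the values are invented and say nothing about the real rates.

```python
import pandas as pd

# Made-up records for illustration only.
demo = pd.DataFrame({
    "Year": [2001] * 6,
    "Primary Type": ["KIDNAPPING", "KIDNAPPING", "THEFT", "THEFT", "THEFT", "BATTERY"],
    "Arrest": [True, False, False, False, True, False],
})

def arrest_rate(df, year, primary_type=None):
    """Share of records with Arrest == True, optionally for one crime type."""
    subset = df[df["Year"] == year]
    if primary_type is not None:
        subset = subset[subset["Primary Type"] == primary_type]
    # The mean of a boolean column is the fraction of True values.
    return subset["Arrest"].mean()

all_rate = arrest_rate(demo, 2001)
kidnap_rate = arrest_rate(demo, 2001, "KIDNAPPING")
print(f"all crimes: {all_rate:.1%}")
print(f"kidnapping: {kidnap_rate:.1%}")
```

In the notebook, `kidnapping_df['Arrest'].mean()` would give the kidnapping-only arrest rate for 2001 directly.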

YEAR 2001 KIDNAPPING INCIDENT HEATMAP

In [11]:
locations = list(zip(kidnapping_df['Latitude'], kidnapping_df['Longitude']))
m = folium.Map(location=[kidnapping_df['Latitude'].mean(), kidnapping_df['Longitude'].mean()], zoom_start=10)
HeatMap(locations).add_to(m)
m.save('insight_3-5.html')
m  
Out[11]:
[Interactive folium heat map of 2001 kidnapping incidents; saved as insight_3-5.html]

Q4: YEAR 2024 KIDNAPPING ANALYSIS

In [12]:
chicago_df_2024 = chicago_df[chicago_df['Year'] == 2024]
kidnapping_df_2024 = chicago_df_2024[chicago_df_2024['Primary Type'] == 'KIDNAPPING']

print("Total kidnapping rows in 2024:", len(kidnapping_df_2024))

loc_counts = kidnapping_df_2024['Location Description'].value_counts().head(10)

plt.figure(figsize=(10,5))
sns.barplot(x=loc_counts.index, y=loc_counts.values, palette='magma')
plt.title('Top Locations for KIDNAPPING in 2024')
plt.xticks(rotation=45, ha='right')
plt.xlabel('Location Description')
plt.ylabel('Number of Kidnapping Incidents')
plt.show()
Total kidnapping rows in 2024: 95

Insight No 6

Comparing 2024 with 2001, there are some changes. The most noticeable is the drop in kidnapping, from 924 records to 95. The top location for kidnapping is now the street. This is surprising because streets are usually busy, so I think many of these incidents may happen at night when only a few people are around. The kidnappers could also belong to an organized syndicate, since pulling this off in a public place would take coordination.
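
The size of the drop can be worked out from the two totals printed by the kidnapping cells (924 rows for 2001 and 95 rows for 2024). Both totals only include rows that kept their latitude and longitude after the null-handling step, so this is a sketch of the change in mapped records.

```python
# Totals printed by the 2001 and 2024 kidnapping cells.
kidnapping_2001 = 924
kidnapping_2024 = 95

# Percent change relative to the 2001 total.
pct_change = (kidnapping_2024 - kidnapping_2001) / kidnapping_2001 * 100
print(f"{pct_change:.1f}%")  # -89.7%
```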

Insight No 7

I also think people may be more dangerous now, because there is a kidnapping case at a Church/Synagogue/Place of Worship. The count is low, but it is still a case. The kidnappers could have been paid to do it, because most people who go to these places are good people, and the kidnappers might have been waiting to ambush their target.

In [13]:
# This cell counts Arrest over all 2024 records, not only kidnapping rows.
# For kidnapping only, use kidnapping_df_2024['Arrest'] instead.
chicago_df_2024 = chicago_df[chicago_df['Year'] == 2024]
arrest_counts = chicago_df_2024['Arrest'].value_counts()

plt.pie(arrest_counts,
        labels = arrest_counts.index,
        autopct = '%1.1f%%',
        startangle = 90)
plt.gcf().set_size_inches(10,7)
plt.title('Arrest Distribution, All Crimes, 2024')
plt.show()

Insight No 8

In 2024, 86.2% of reports had no arrest. The cell counts Arrest over chicago_df_2024, so this figure covers all reported crimes in 2024, not only kidnappings. The pattern is the same as in 2001: most suspects do not get caught. Just thinking that these offenders are roaming freely is alarming. It could be that they plan their escape in advance. I think the government needs to invest in CCTV around the city to improve security for its people.

YEAR 2024 KIDNAPPING INCIDENT HEATMAP

In [14]:
locations = list(zip(kidnapping_df_2024['Latitude'], kidnapping_df_2024['Longitude']))
m = folium.Map(location=[kidnapping_df_2024['Latitude'].mean(), kidnapping_df_2024['Longitude'].mean()], zoom_start=10)
HeatMap(locations).add_to(m)
m.save('insight_6-8.html')
m  
Out[14]:
[Interactive folium heat map of 2024 kidnapping incidents; saved as insight_6-8.html]

Q5: ARSON ANALYSIS

In [15]:
arson_df = chicago_df[chicago_df['Primary Type'] == 'ARSON'].copy()

year_counts = arson_df['Year'].value_counts().reset_index()
year_counts.columns = ['Year', 'Count']
year_counts = year_counts.sort_values('Year')

plt.figure(figsize=(10,5))
sns.barplot(data=year_counts, x='Year', y='Count', palette='magma')
plt.xlabel('Year')
plt.ylabel('Number of ARSON incidents')
plt.title('ARSON count by Year')
plt.xticks(rotation=90)
plt.show()

Insight No 9

Arson peaked in the early 2000s, with the highest count in 2001. One possible reason is that more buildings used wood in those years, and wood burns easily, which may have made them easier targets for arsonists. The data cannot confirm this. The counts continue to decline through 2024. One reason could be that there are strong penalties for this crime in Chicago.
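
The peak year and the year-over-year decline can be read from the chart, but they can also be computed. A minimal sketch on a made-up frame `demo` (invented rows, not the real arson counts) uses `groupby(...).size()`, `idxmax()`, and `pct_change()`:

```python
import pandas as pd

# Made-up records for illustration only.
demo = pd.DataFrame({
    "Year": [2001, 2001, 2001, 2002, 2002, 2003, 2024],
    "Primary Type": ["ARSON"] * 6 + ["THEFT"],
})

arson = demo[demo["Primary Type"] == "ARSON"]

# Number of arson records per year, sorted by year.
per_year = arson.groupby("Year").size()

# Year with the highest count.
peak_year = per_year.idxmax()

# Fractional change from the previous year in the series.
yearly_change = per_year.pct_change()

print(per_year)
print("peak year:", peak_year)
print(yearly_change.round(2))
```

Applied to `arson_df`, `per_year.idxmax()` would confirm whether 2001 is in fact the peak.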

In [16]:
# Arson records for 2001, the year selected by the filter.
arson_2001 = arson_df[arson_df['Year'] == 2001]
arrest_counts = arson_2001['Arrest'].value_counts()

plt.pie(arrest_counts,
        labels = arrest_counts.index,
        autopct = '%1.1f%%',
        startangle = 90)
plt.gcf().set_size_inches(10,7)
plt.title('Arson Arrest Distribution 2001')
plt.show()

Insight No 10

The cell filters arson records to 2001. Seeing 82.1% False and only 17.9% True makes me uneasy because it means fewer than one in five reported arson cases led to an arrest. That gap suggests investigations are struggling with evidence, delayed reporting, or resource limits rather than incidents being unimportant. I’d want better rapid-response coordination, improved scene preservation and CCTV coverage, and clearer community reporting channels so victims and witnesses can help investigations succeed.
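
A single pie chart covers one year only. To put that year in context, the arrest rate can be computed for every year at once. This is a sketch on a made-up frame `demo` with invented values; on the real data the same `groupby` would run on `arson_df`.

```python
import pandas as pd

# Made-up arson records for illustration only.
demo = pd.DataFrame({
    "Year": [2001, 2001, 2001, 2001, 2024, 2024],
    "Arrest": [True, False, False, False, False, True],
})

# Mean of a boolean column per year = share of records with an arrest.
arrest_rate_by_year = demo.groupby("Year")["Arrest"].mean()
print(arrest_rate_by_year)
```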

YEAR 2001-2025 ARSON HEATMAP

In [17]:
locations = list(zip(arson_df['Latitude'], arson_df['Longitude']))
m = folium.Map(location=[arson_df['Latitude'].mean(), arson_df['Longitude'].mean()], zoom_start=10)
HeatMap(locations).add_to(m)
m.save('insight_9-10.html')
m
Out[17]:
[Interactive folium heat map of arson incidents across all years; saved as insight_9-10.html]